Abstract: We derive lower bounds on the convergence speed
of a widely used class of distributed averaging algorithms. In
particular, we prove that any distributed averaging algorithm
whose state consists of a single real number and whose (possibly
nonlinear) update function satisfies a natural smoothness
condition has a worst-case running time of at least on the order of
$n^2$ on a network of $n$ nodes. Our results suggest that increased
memory or expansion of the state space is crucial for improving
the running times of distributed averaging algorithms.

I. Introduction 

The goal of this paper is to analyze the fundamental 
limitations of a class of distributed averaging algorithms. 
These algorithms are message-passing rules for a collection 
of agents (which may be sensors, nodes of a communication 
network, or UAVs), each beginning with a real number, to 
estimate the average of these numbers using only nearest 
neighbor communications. Such algorithms are interesting 
because a number of sophisticated network coordination 
tasks can be reduced to averaging (see [13], [25], [1], [2], [6], 
[8], [9], [20], [23]), and also because they can be designed 
to be robust to frequent failures of communication links. 

A variety of such algorithms are available (see [22], [10], 
[18], [24], [15], [16], [17], [12], [26], [21]). However, many 
of these algorithms tend to suffer from a common disadvantage:
even when no link failures occur, their convergence
times do not scale well in the number of agents. Our aim 
in this paper is to show that this is, in fact, unavoidable 
for a common class of such algorithms; namely, that any 
distributed averaging algorithm that uses a single scalar state 
variable at each agent and satisfies a natural "smoothness" 
condition will have this property. 

We next proceed to define distributed averaging algorithms 
and informally state our result. 

A. Background and basic definitions. 

Definition of local averaging algorithms: Agents $1, \ldots, n$
begin with real numbers $x_1(0), \ldots, x_n(0)$ stored in memory.
At each round $t = 0, 1, 2, \ldots$, agent $i$ broadcasts $x_i(t)$ to
each of its neighbors in some undirected graph
$G(t) = (\{1, \ldots, n\}, E(t))$, and then sets $x_i(t+1)$ to be some
function of $x_i(t)$ and of the values $x_{i'}(t), x_{i''}(t), \ldots$ it has
just received from its own neighbors:

$$x_i(t+1) = f_{i,G(t)}\big(x_i(t), x_{i'}(t), x_{i''}(t), \ldots\big). \qquad (1)$$

We require each $f_{i,G(t)}$ to be a differentiable function. Each
agent uses the incoming messages $x_{i'}(t), x_{i''}(t), \ldots$ as the
arguments of $f_{i,G(t)}$ in some arbitrary order; we assume
that this order does not change, i.e., if $G(t_1) = G(t_2)$, then
the message coming from the same neighbor of agent $i$ is
mapped to the same argument of $f_{i,G(t)}$ for $t = t_1$ and
$t = t_2$. It is desired that

$$\lim_{t \to \infty} x_i(t) = \frac{1}{n} \sum_{j=1}^{n} x_j(0), \qquad (2)$$

for every $i$, for every sequence of graphs $G(t)$ having the
property that

the graph $(\{1, \ldots, n\}, \cup_{s \ge t} E(s))$ is connected for every $t$, (3)

and for every possible way for the agents to map incoming
messages to arguments of $f_{i,G(t)}$.

In words, as the number of rounds $t$ approaches infinity,
iteration (1) must converge to the average of the numbers
$x_1(0), \ldots, x_n(0)$. Note that the agents have no control
over the communication graph sequence $G(t)$, which is
exogenously provided by "nature." However, as we stated
previously, every element of the sequence $G(t)$ must be
undirected: this corresponds to bidirectional models of
communication between agents. Moreover, the sequence $G(t)$
must satisfy the mild connectivity condition of Eq. (3), which
says that the network cannot become disconnected after a
finite period.
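
To make the definition concrete, the following minimal Python sketch runs iteration (1) on a fixed line graph. The helper names (`line_graph`, `local_averaging_round`, `one_third_update`, `run`) and the particular linear rule with weight 1/3 are illustrative assumptions, not taken from the paper; the rule is simply one differentiable choice of the update function.

```python
def line_graph(n):
    """Neighbor lists for the line graph 0 - 1 - ... - (n-1)."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]


def local_averaging_round(x, neighbors, update):
    """One synchronous round of Eq. (1): each agent sees only its own
    value and the values broadcast by its current neighbors."""
    return [update(x[i], [x[j] for j in neighbors[i]]) for i in range(len(x))]


def one_third_update(own, received):
    # Illustrative differentiable rule (an assumption of this sketch):
    # move toward each neighbor's value with weight 1/3.
    return own + sum(v - own for v in received) / 3.0


def run(x0, rounds):
    """Apply the update rule for a number of rounds on a fixed line graph."""
    x = list(x0)
    neighbors = line_graph(len(x))
    for _ in range(rounds):
        x = local_averaging_round(x, neighbors, one_third_update)
    return x


if __name__ == "__main__":
    x0 = [4.0, 0.0, 2.0, 6.0]
    print(run(x0, 200))          # every entry approaches the initial average
    print(sum(x0) / len(x0))
```

Each agent stores a single number, and the update uses only values received in the current round, which is the restriction the lower bound in this paper relies on.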

Local averaging algorithms are useful tools for information
fusion due to their efficient utilization of resources (each
agent stores only a single number in memory) as well as
their robustness properties (the sequence of graphs $G(t)$
is time-varying, and it only needs to satisfy the relatively
weak connectivity condition in Eq. (3) for the convergence
in Eq. (2) to hold). As far as the authors are aware, no other
class of schemes for averaging (e.g., flooding, fusion along
a spanning tree) is known to produce similar results under
the same assumptions.

Remark: As can be seen from the subscripts, the update
function $f_{i,G(t)}$ is allowed to depend on the agent and on the
graph. Some dependence on the graph is unavoidable since
in different graphs an agent may have a different number
of neighbors, in which case nodes will receive a different
number of messages, so that even the number of arguments
of $f_{i,G(t)}$ will depend on $G(t)$. It is often practically desired
that $f_{i,G(t)}$ depend only weakly on the graph, as the entire
graph may be unknown to agent $i$. For example, we might
require that $f_{i,G(t)}$ be completely determined by the degree
of $i$ in $G(t)$. However, since our focus is on what distributed
algorithms cannot do, it does not hurt to assume the agents
have unrealistically rich information; thus we will not assume
any restrictions on how $f_{i,G(t)}$ depends on $G(t)$.

Remark: We require the functions $f_{i,G(t)}$ to be smooth, for
the following reasons. First, we need to exclude unnatural
algorithms that encode vector information in the infinitely
many bits of a single real number. Second, although we
make the convenient technical assumption that agents can
transmit and store real numbers, we must be aware that in
practice agents will transmit and store a quantized version
of $x_i(t)$. Thus, we are mostly interested in algorithms that
are not disrupted much by quantization. For this reason, we
must prohibit the agents from using discontinuous update
functions $f_{i,G(t)}$. For technical reasons, we actually go a
little further, and prohibit the agents from using non-smooth
update functions $f_{i,G(t)}$.

B. Examples. 

In order to provide some context, let us mention just a few 
of the distributed averaging schemes that have been proposed 
in the literature: 

1) The max-degree method [18] involves picking $\epsilon(t)$
with the property $\epsilon(t) \le 1/(d(t) + 1)$, where $d(t)$ is
the largest degree of any agent in $G(t)$, and updating
by

$$x_i(t+1) = x_i(t) + \epsilon(t) \sum_{j \in N_i(t)} \big(x_j(t) - x_i(t)\big).$$

Here we use $N_i(t)$ to denote the set of neighbors
of agent $i$ in $G(t)$. In practice, a satisfactory $\epsilon(t)$
may not be known to all of the agents, because this
requires some global information. However, in some
cases a satisfactory choice for $\epsilon(t)$ may be available,
for example when an a priori upper bound on $d(t)$
is known.

2) The Metropolis method [24] involves setting $\epsilon_{ij}(t)$ to
satisfy $\epsilon_{ij}(t) \le \min\big(1/(d_i(t) + 1), 1/(d_j(t) + 1)\big)$,
where $d_i(t), d_j(t)$ are the degrees of agents $i$ and $j$
in $G(t)$, and updating by

$$x_i(t+1) = x_i(t) + \sum_{j \in N_i(t)} \epsilon_{ij}(t) \big(x_j(t) - x_i(t)\big).$$

3) The load-balancing algorithm of [17] involves updating
by

$$x_i(t+1) = x_i(t) + \sum_{j \in N_i(t)} a_{ij}(t) \big(x_j(t) - x_i(t)\big),$$

where $a_{ij}(t)$ is determined by the following rule: each
agent selects exactly two neighbors, the neighbor with
the largest value above its own and the neighbor with the
smallest value below its own. If $i, j$ have both selected each
other, then $a_{ij}(t) = 1/3$; else $a_{ij}(t) = 0$. The intuition
comes from load-balancing: agents think of $x_i(t)$ as
load to be equalized among their neighbors; they try
to offload on their lightest neighbor and take from their
heaviest neighbor.



We remark that the above load-balancing algorithm is not
a "local averaging algorithm" according to our definition
because $x_i(t+1)$ does not depend only on $x_i(t)$ and the values
of its neighbors; for example, agents $i$ and $j$ may not match up
because $j$ has a neighbor $k$ with $x_k(t) > x_i(t)$. By contrast,
the max-degree and Metropolis algorithms are indeed "local
averaging algorithms."

For each of the above algorithms, it is known that Eq. (2)
holds provided the connectivity condition in Eq. (3) holds. A
proof of this fact for the load-balancing algorithm is implicit
in [17], and for the others it follows from the results of [14],
[3].
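
As an illustration of the first two schemes, here is a minimal Python sketch of one round of the max-degree and Metropolis updates. The function names, the 5-node tree, the initial values, and the safety factor 0.9 (used only to pick step sizes inside the stated bounds) are assumptions made for this example.

```python
def max_degree_step(x, neighbors):
    """Max-degree update with a single step size eps(t) for all agents."""
    d = max(len(nb) for nb in neighbors)
    eps = 0.9 / (d + 1)   # inside the bound 1/(d(t)+1); 0.9 is arbitrary
    return [x[i] + eps * sum(x[j] - x[i] for j in neighbors[i])
            for i in range(len(x))]


def metropolis_step(x, neighbors):
    """Metropolis update with a per-edge step size eps_ij(t)."""
    deg = [len(nb) for nb in neighbors]
    new_x = []
    for i in range(len(x)):
        change = 0.0
        for j in neighbors[i]:
            # Inside the bound min(1/(d_i+1), 1/(d_j+1)); 0.9 is arbitrary.
            eps_ij = 0.9 * min(1.0 / (deg[i] + 1), 1.0 / (deg[j] + 1))
            change += eps_ij * (x[j] - x[i])
        new_x.append(x[i] + change)
    return new_x


def iterate(step, x0, neighbors, rounds):
    """Apply the same update rule for a number of rounds on a fixed graph."""
    x = list(x0)
    for _ in range(rounds):
        x = step(x, neighbors)
    return x


# Hypothetical 5-node tree with edges 0-1, 1-2, 1-3, 3-4.
TREE = [[1], [0, 2, 3], [1], [1, 4], [3]]
X0 = [10.0, 0.0, 4.0, 2.0, 9.0]   # initial average is 5.0

if __name__ == "__main__":
    print(iterate(max_degree_step, X0, TREE, 2000))
    print(iterate(metropolis_step, X0, TREE, 2000))
```

The max-degree rule needs the global quantity $d(t)$, while the Metropolis rule only needs the degrees of the two endpoints of each edge; both use symmetric weights, so the sum of the values is preserved in every round.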

C. Our contribution 

Our goal is to study the worst-case convergence time
over all graph sequences. This convergence time may be
arbitrarily bad since one can insert arbitrarily many empty
graphs into the sequence $G(t)$ without violating Eq. (3). To
avoid this trivial situation, we require that there exist some
integer $B$ such that the graphs

$$\Big(\{1, \ldots, n\}, \bigcup_{i=kB}^{(k+1)B} E(i)\Big) \qquad (4)$$

are connected for every integer $k$.

Let $x(t)$ be the vector in $\mathbb{R}^n$ whose $i$th component is $x_i(t)$.
We define the convergence time $T(n, \epsilon)$ of a local averaging
algorithm as the time until the "sample variance"

$$V(x(t)) = \sum_{i=1}^{n} \Big( x_i(t) - \frac{1}{n} \sum_{j=1}^{n} x_j(0) \Big)^2$$

permanently shrinks by a factor of $\epsilon$, i.e., $V(x(t)) \le
\epsilon V(x(0))$ for all $t \ge T(n, \epsilon)$, for all possible $n$-node graph
sequences satisfying Eq. (4), and all initial vectors $x(0)$ for
which not all $x_i(0)$ are equal; $T(n, \epsilon)$ is defined to be the
smallest number with this property. We are interested in how
$T(n, \epsilon)$ scales with $n$ and $\epsilon$.

Currently, the best available upper bound for the convergence
time is obtained with the load-balancing algorithm; in
[17] it was proven that

$$T(n, \epsilon) \le C n^2 B \log\frac{1}{\epsilon},$$

for some absolute constant $C$ (see Footnote 1). We are primarily
interested in whether it is possible to improve the scaling with $n$
to below $n^2$. Are there nonlinear update functions $f_{i,G(t)}$ which
speed up the convergence time?

Our main result is that the answer to this question is
"no" within the class of local averaging algorithms. For such
algorithms we prove a general lower bound of the form

$$T(n, \epsilon) \ge c n^2 B \log\frac{1}{\epsilon},$$

for some absolute constant $c$. Moreover, this lower bound
holds even if we assume that the graph sequence $G(t)$ is the
same for all $t$; in fact, we prove it for the case where $G(t)$
is a fixed "line graph."

Footnote 1: By "absolute constant" we mean that $C$ does not depend
on the problem parameters $n, B, \epsilon$.
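
The quadratic scaling can be observed empirically on a fixed line graph. The sketch below is a minimal illustration under stated assumptions: the linear rule with weight 1/3 on every edge, the ramp initial condition $x_i(0) = i$, and the function names are choices made for this example. Because it measures a single initial vector, it only gives a lower estimate of the worst-case quantity $T(n, \epsilon)$.

```python
def sample_variance(x, mean0):
    """V(x(t)): sum of squared deviations from the initial average."""
    return sum((v - mean0) ** 2 for v in x)


def line_step(x):
    """One round of a linear scheme on the fixed line graph, with weight
    1/3 on every edge (an illustrative choice for this sketch)."""
    n = len(x)
    new_x = []
    for i in range(n):
        change = 0.0
        if i > 0:
            change += (x[i - 1] - x[i]) / 3.0
        if i < n - 1:
            change += (x[i + 1] - x[i]) / 3.0
        new_x.append(x[i] + change)
    return new_x


def rounds_to_shrink(n, eps, max_rounds=10**6):
    """First t with V(x(t)) <= eps * V(x(0)), starting from x_i(0) = i.
    For this symmetric linear scheme the variance never increases, so
    the first crossing is also permanent."""
    x = [float(i) for i in range(n)]
    mean0 = sum(x) / n
    v0 = sample_variance(x, mean0)
    for t in range(max_rounds + 1):
        if sample_variance(x, mean0) <= eps * v0:
            return t
        x = line_step(x)
    raise RuntimeError("did not reach the target within max_rounds")


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, rounds_to_shrink(n, 1e-3))   # roughly quadruples as n doubles
```

Doubling $n$ roughly quadruples the number of rounds, in line with the $n^2$ scaling of both the upper and the lower bound.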



II. Formal statement and proof of main result 

We next state our main theorem. The theorem begins by 
specializing our definition of local averaging algorithm to the 
case of a fixed line graph, and states a lower bound on the 
convergence time in this setting. 

We will use the notation $\mathbf{1}$ to denote the vector in $\mathbb{R}^n$
whose entries are all ones, and $\mathbf{0}$ to denote the vector
whose entries are all 0. The average of the initial values
$x_1(0), \ldots, x_n(0)$ will be denoted by $\bar{x}$.

Theorem 1: Let $f_1, f_n$ be two differentiable functions
from $\mathbb{R}^2$ to $\mathbb{R}$, and let $f_2, f_3, \ldots, f_{n-1}$ be differentiable
functions from $\mathbb{R}^3$ to $\mathbb{R}$. Consider the dynamical system

$$\begin{aligned}
x_1(t+1) &= f_1(x_1(t), x_2(t)), \\
x_i(t+1) &= f_i(x_i(t), x_{i-1}(t), x_{i+1}(t)), \quad i = 2, \ldots, n-1, \\
x_n(t+1) &= f_n(x_{n-1}(t), x_n(t)).
\end{aligned} \qquad (5)$$

Suppose that there exists a function $\tau(n, \epsilon)$ such that

$$\frac{\|x(t) - \bar{x}\mathbf{1}\|_2}{\|x(0) - \bar{x}\mathbf{1}\|_2} < \epsilon,$$

for all $n$ and $\epsilon$, all $t \ge \tau(n, \epsilon)$, and all initial conditions
$x_1(0), \ldots, x_n(0)$ for which not all $x_i(0)$ are equal. Then,

$$\tau(n, \epsilon) \ge \frac{n^2}{30} \log\frac{1}{\epsilon}, \qquad (6)$$

for all $\epsilon > 0$ and $n \ge 3$.

Remark: The dynamical system described in the theorem
statement is simply what a local averaging algorithm looks
like on a line graph. The functions $f_1, f_n$ are the update
functions at the left and right endpoints of the line (which
have only a single neighbor), while the update functions
$f_2, f_3, \ldots, f_{n-1}$ are the ones used by the middle agents
(which have two neighbors). As a corollary, the convergence
time of any local averaging algorithm must satisfy the lower
bound $T(n, \epsilon) \ge (1/30) n^2 \log(1/\epsilon)$.

Remark: Fix some $n \ge 3$. A corollary of our theorem
is that there are no "local averaging algorithms" which
compute the average in finite time. More precisely, there
is no local averaging algorithm which, starting from initial
conditions $x(0)$ in some ball around the origin, always results
in $x(t) = \bar{x}\mathbf{1}$ for all times $t$ larger than some $T$. We will
sketch a proof of this after proving Theorem 1. By contrast,
the existence of such algorithms in slightly different models
of agent interactions was demonstrated in [7] and [19].

A. Proof of Theorem 1

We first briefly sketch the proof strategy. We will begin by
pointing out that $\mathbf{0}$ must be an equilibrium of Eq. (5); then,
we will argue that an upper bound on the convergence time
of Eq. (5) would imply a similar convergence time bound
on the linearization of Eq. (5) around the equilibrium of $\mathbf{0}$.
This will allow us to apply a previous $\Omega(n^2)$ convergence
time lower bound for linear schemes, proved by the authors
in [21].

Let $f$ (without a subscript) be the mapping from $\mathbb{R}^n$ to
itself that maps $x(t)$ to $x(t+1)$ according to Eq. (5).

Lemma 1: $f(a\mathbf{1}) = a\mathbf{1}$, for any $a \in \mathbb{R}$.

Proof: Suppose that $x(0) = a\mathbf{1}$. Then, the initial
average is $a$, so that

$$a\mathbf{1} = \lim_{t} x(t) = \lim_{t} x(t+1) = \lim_{t} f(x(t)).$$

We use the continuity of $f$ to get

$$a\mathbf{1} = f\big(\lim_{t} x(t)\big) = f(a\mathbf{1}). \qquad \blacksquare$$



For $i, j = 1, \ldots, n$, we define $a_{ij} = \partial f_i(\mathbf{0}) / \partial x_j$,
and the matrix

$$A = f'(\mathbf{0}) = [a_{ij}].$$

Lemma 2: For any integer $k \ge 1$,

$$\lim_{x \to \mathbf{0}} \frac{\|f^k(x) - A^k x\|_2}{\|x\|_2} = 0,$$

where $f^k$ refers to the $k$-fold composition of $f$ with itself.

Proof: The fact that $f(\mathbf{0}) = \mathbf{0}$ implies by the chain rule
that the derivative of $f^k$ at $x = \mathbf{0}$ is $A^k$. The above equation
is a restatement of this fact. ■
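
The content of Lemma 2 can be illustrated numerically. The sketch below uses a hypothetical smooth nonlinear update on a 5-node line graph, built from `sin`, purely as an example of a map with $f(\mathbf{0}) = \mathbf{0}$; it is not taken from the paper and is not claimed to satisfy the averaging requirement for all initial conditions. Its derivative at the origin is the tridiagonal matrix with off-diagonal entries 1/3, implemented here as `A_times`.

```python
import math


def f(x):
    """Hypothetical smooth nonlinear update on the line graph:
    x_i + (1/3) * sum over neighbors j of sin(x_j - x_i).
    It satisfies f(0) = 0; it is not claimed to satisfy Eq. (2)."""
    n = len(x)
    return [x[i] + sum(math.sin(x[j] - x[i])
                       for j in (i - 1, i + 1) if 0 <= j < n) / 3.0
            for i in range(n)]


def A_times(x):
    """Multiply by A = f'(0): the same rule with sin(u) replaced by u."""
    n = len(x)
    return [x[i] + sum(x[j] - x[i]
                       for j in (i - 1, i + 1) if 0 <= j < n) / 3.0
            for i in range(n)]


def norm2(x):
    return math.sqrt(sum(v * v for v in x))


def linearization_gap(scale, k=5, direction=(1.0, -1.0, 2.0, 0.0, -2.0)):
    """||f^k(x) - A^k x||_2 / ||x||_2 for x = scale * direction."""
    x = [scale * d for d in direction]
    fx, ax = list(x), list(x)
    for _ in range(k):
        fx, ax = f(fx), A_times(ax)
    return norm2([p - q for p, q in zip(fx, ax)]) / norm2(x)


if __name__ == "__main__":
    for scale in (1e-1, 1e-2, 1e-3):
        print(scale, linearization_gap(scale))   # gap shrinks as scale -> 0
```

As the initial vector is scaled toward the origin, the relative gap between the nonlinear iterates and the linearized iterates shrinks toward zero, which is the statement of the lemma for this particular map.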
Lemma 3: Suppose that $x^T \mathbf{1} = 0$. Then,

$$\lim_{m \to \infty} A^m x = \mathbf{0}.$$

Proof: Let $\mathcal{B}$ be a ball around the origin such that for
all $x \in \mathcal{B}$, with $x \neq \mathbf{0}$, we have

$$\frac{\|f^k(x) - A^k x\|_2}{\|x\|_2} \le \frac{1}{4}, \quad \text{for } k = \tau(n, 1/2).$$

Such a ball can be found due to Lemma 2. Since we can
scale $x$ without affecting the assumptions or conclusions of
the lemma we are trying to prove, we can assume that $x \in \mathcal{B}$.
It follows that for $k = \tau(n, 1/2)$, we have

$$\frac{\|A^k x\|_2}{\|x\|_2} = \frac{\|A^k x - f^k(x) + f^k(x)\|_2}{\|x\|_2}
\le \frac{1}{4} + \frac{\|f^k(x)\|_2}{\|x\|_2}
\le \frac{1}{4} + \frac{1}{2}
\le \frac{3}{4},$$

where the second inequality uses $\bar{x} = 0$ and the definition of
$\tau(n, 1/2)$. Since this inequality implies that $A^k x \in \mathcal{B}$, we can
apply the same argument recursively to get

$$\lim_{m \to \infty} (A^k)^m x = \mathbf{0},$$

which implies the conclusion of the lemma. ■
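Lemma 4: $A\mathbf{1} = \mathbf{1}$.

Proof: We have

$$A\mathbf{1} = \lim_{h \to 0} \frac{f(\mathbf{0} + h\mathbf{1}) - f(\mathbf{0})}{h} = \lim_{h \to 0} \frac{h\mathbf{1}}{h} = \mathbf{1},$$

where we used Lemma 1. ■

Lemma 5: For every vector $x \in \mathbb{R}^n$,

$$\lim_{k \to \infty} A^k x = \bar{x}\mathbf{1},$$

where $\bar{x} = \big(\sum_{i=1}^{n} x_i\big)/n$.

Proof: Every vector $x$ can be written as

$$x = \bar{x}\mathbf{1} + y,$$

where $y^T \mathbf{1} = 0$. Thus,

$$\lim_{k \to \infty} A^k x = \lim_{k \to \infty} A^k(\bar{x}\mathbf{1} + y) = \bar{x}\mathbf{1} + \lim_{k \to \infty} A^k y = \bar{x}\mathbf{1},$$

where we used Lemmas 3 and 4. ■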

Lemma 6: The matrix $A$ has the following properties:

1) $a_{ij} = 0$ whenever $|i - j| > 1$.

2) The graph $G = (\{1, \ldots, n\}, E)$, with
$E = \{(i, j) \mid a_{ij} \neq 0\}$, is strongly connected.

3) $A\mathbf{1} = \mathbf{1}$ and $\mathbf{1}^T A = \mathbf{1}^T$.

4) An eigenvalue of $A$ of largest modulus has modulus 1.

5) $A$ has an eigenvector $v$, with real eigenvalue
$\lambda \in (1 - 6/n^2, 1)$, such that $v^T \mathbf{1} = 0$.

Proof: 

1) True because of the definitions of $f$ and $A$.

2) Suppose not. Then, there is a nonempty proper subset
$S \subset \{1, \ldots, n\}$ with the property that $a_{ij} = 0$ whenever
$i \in S$ and $j \in S^c$. Consider the vector $x$ with
$x_i = 0$ for $i \in S$, and $x_j = 1$ for $j \in S^c$. Clearly,
$(1/n) \sum_i x_i > 0$, but $(A^k x)_i = 0$ for $i \in S$. This
contradicts Lemma 5.

3) The first equality was already proven in Lemma 4. For
the second, let $b = \mathbf{1}^T A$. Consider the vector

$$z = \lim_{k \to \infty} A^k e_i, \qquad (7)$$

where $e_i$ is the $i$th unit vector. By Lemma 5,

$$z = \frac{\mathbf{1}^T e_i}{n} \mathbf{1} = \frac{1}{n} \mathbf{1}.$$

On the other hand,

$$\lim_{k \to \infty} A^k e_i = \lim_{k \to \infty} A^{k+1} e_i = \lim_{k \to \infty} A^k (A e_i).$$

Applying Lemma 5 again, we get

$$z = \frac{\mathbf{1}^T A e_i}{n} \mathbf{1} = \frac{b_i}{n} \mathbf{1},$$

where $b_i$ is the $i$th component of $b$. We conclude that
$b_i = 1$; since no assumption was made on $i$, this
implies that $b = \mathbf{1}^T$, which is what we needed to show.
4) We already know that $A\mathbf{1} = \mathbf{1}$, so that an eigenvalue
with modulus 1 exists. Now suppose there is
an eigenvalue with larger modulus, that is, there is
some vector $x \in \mathbb{C}^n$ such that $Ax = \lambda x$ and
$|\lambda| > 1$. Then $\lim_k \|A^k x\|_2 = \infty$. By writing
$x = x_{\rm real} + i x_{\rm imaginary}$, we immediately have that
$A^k x = A^k x_{\rm real} + i A^k x_{\rm imaginary}$. But by Lemma 5,
both $A^k x_{\rm real}$ and $A^k x_{\rm imaginary}$ approach some finite
multiple of $\mathbf{1}$ as $k \to \infty$, so $\|A^k x\|_2$ is bounded above.
This is a contradiction.
5) The following fact is a combination of Theorems 4.1
and 6.1 in [21]: Consider an $n \times n$ matrix $A$ such
that $a_{ij} = 0$ whenever $|i - j| > 1$, and such that the
graph with edge set $\{(i, j) \mid a_{ij} \neq 0\}$ is connected.
Let $\lambda_1, \lambda_2, \ldots$ be its eigenvalues in order of decreasing
modulus. Suppose that $\lambda_1 = 1$, $A\mathbf{1} = \mathbf{1}$, and $\pi^T A =
\pi^T$, for some vector $\pi$ satisfying $\sum_i \pi_i = 1$, and $\pi_i \ge
1/(Cn)$ for some positive $C$ and for all $i$. Then, $A$
has a real eigenvalue in $(1 - 6C/n^2, 1)$ (see Footnote 2).
Furthermore, the corresponding eigenvector is orthogonal to
$\mathbf{1}$, since right-eigenvectors of a matrix are orthogonal to
left-eigenvectors with different eigenvalues.
By parts 1-4, all the assumptions of the result from [21]
are satisfied with $\pi = \mathbf{1}/n$ and $C = 1$, thus completing
the proof of the lemma.
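
The properties in Lemma 6 can be checked numerically for a specific linear scheme. The sketch below assumes the tridiagonal matrix with weight 1/3 on every edge of the line graph, which is an illustrative choice rather than the paper's general $A$, and estimates the second eigenvalue by power iteration on the subspace orthogonal to $\mathbf{1}$. The helper names are hypothetical, and the power iteration relies on the assumption that, for this matrix, the eigenvalue of largest modulus on that subspace is real, positive, and simple.

```python
import math


def line_matrix(n):
    """Tridiagonal matrix for a linear scheme on the line graph with
    weight 1/3 on each edge (an illustrative choice for this sketch)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                A[i][j] = 1.0 / 3.0
        A[i][i] = 1.0 - sum(A[i])   # diagonal makes each row sum to 1
    return A


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def second_eigenvalue(A, iters=2000):
    """Power iteration restricted to vectors orthogonal to the all-ones
    vector; returns the Rayleigh quotient and the final unit vector."""
    n = len(A)
    y = [i - (n - 1) / 2.0 for i in range(n)]    # mean-zero start
    lam = 0.0
    for _ in range(iters):
        z = matvec(A, y)
        mean = sum(z) / n
        z = [v - mean for v in z]                # re-project onto 1-perp
        lam = sum(p * q for p, q in zip(y, z)) / sum(p * p for p in y)
        norm = math.sqrt(sum(v * v for v in z))
        y = [v / norm for v in z]
    return lam, y


if __name__ == "__main__":
    n = 10
    A = line_matrix(n)
    lam, v = second_eigenvalue(A)
    print(lam, 1 - 6 / n**2)    # lam is expected to lie in (1 - 6/n^2, 1)
    print(abs(sum(v)))          # eigenvector is (numerically) orthogonal to 1
```

For this matrix the second eigenvalue is close to 1, with a gap of order $1/n^2$, which is exactly the feature that drives the lower bound.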



Remark: An alternative proof of part 5 is possible. One can
argue that parts 1 and 3 force $A$ to be symmetric, and that
Lemma 5 implies that the elements $a_{ij}$ must be nonnegative.
Once these two facts are established, the results of [4] will
then imply an eigenvalue has to lie in $(1 - c/n^2, 1)$, for a
certain absolute constant $c$.

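Proof of Theorem 1: Let $v$ be an eigenvector of $A$ with the
properties in part 5 of Lemma 6. Fix a positive integer $k$.
Let $\epsilon > 0$ and pick $x \neq \mathbf{0}$ to be a small enough multiple of
$v$ so that

$$\frac{\|f^k(x) - A^k x\|_2}{\|x\|_2} < \epsilon.$$

This is possible by Lemma 2. Then, we have

$$\frac{\|f^k(x)\|_2}{\|x\|_2} \ge \frac{\|A^k x\|_2}{\|x\|_2} - \epsilon \ge \Big(1 - \frac{6}{n^2}\Big)^k - \epsilon.$$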



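Using the orthogonality property $x^T \mathbf{1} = 0$, we have $\bar{x} = 0$.
Since we placed no restriction on $\epsilon$, and such an $x$ exists for
every $\epsilon > 0$, this implies that

$$\sup_{x \neq \mathbf{0},\ \bar{x} = 0} \frac{\|f^k(x) - \bar{x}\mathbf{1}\|_2}{\|x - \bar{x}\mathbf{1}\|_2}
= \sup_{x \neq \mathbf{0},\ \bar{x} = 0} \frac{\|f^k(x)\|_2}{\|x\|_2}
\ge \Big(1 - \frac{6}{n^2}\Big)^k.$$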
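Plugging $k = \tau(n, \epsilon)$ into this equation, we see that

$$\Big(1 - \frac{6}{n^2}\Big)^{\tau(n, \epsilon)} \le \epsilon.$$

Since $n \ge 3$, we have $1 - 6/n^2 \in (0, 1)$, and

$$\tau(n, \epsilon) \ge \frac{1}{\log(1 - 6/n^2)} \log \epsilon.$$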



Footnote 2: The reference [21] proves that an eigenvalue lies in
$(1 - c_1 C/n^2, 1)$ for some absolute constant $c_1$. By tracing through
the proof, we find that we can take $c_1 = 6$.



Now using the bound $\log(1 - \alpha) \ge -5\alpha$ for $\alpha \in [0, 2/3]$,
applied with $\alpha = 6/n^2$, we get

$$\tau(n, \epsilon) \ge \frac{n^2}{30} \log\frac{1}{\epsilon}.$$

q.e.d.
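
The final simplification can be checked numerically. This sketch assumes the elementary inequality is used in the form $\log(1 - \alpha) \ge -5\alpha$ on $[0, 2/3]$, which is what turns the bound $\log \epsilon / \log(1 - 6/n^2)$ into $(n^2/30)\log(1/\epsilon)$; the function names are hypothetical.

```python
import math


def exact_threshold(n, eps):
    """log(eps) / log(1 - 6/n^2): the bound before simplification."""
    return math.log(eps) / math.log(1.0 - 6.0 / n**2)


def simplified_bound(n, eps):
    """(n^2 / 30) * log(1/eps): the bound stated in Eq. (6)."""
    return n**2 / 30.0 * math.log(1.0 / eps)


def log_inequality_holds(alpha):
    """Check log(1 - alpha) >= -5 * alpha for a single alpha in [0, 2/3]."""
    return math.log(1.0 - alpha) >= -5.0 * alpha


if __name__ == "__main__":
    grid = [2.0 / 3.0 * i / 1000 for i in range(1001)]
    print(all(log_inequality_holds(a) for a in grid))
    for n in (3, 10, 100):
        print(n, exact_threshold(n, 0.01), simplified_bound(n, 0.01))
```

For every tested $n \ge 3$ and $\epsilon < 1$ the exact threshold dominates the simplified bound, so replacing one by the other only weakens the lower bound.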

Remark: We now sketch the proof of the claim we made
earlier that a local averaging algorithm cannot average in
finite time. Fix $n \ge 3$. Suppose that for any $x(0)$ in some
ball $\mathcal{B}$ around the origin, a local averaging algorithm results
in $x(t) = \bar{x}\mathbf{1}$ for all $t \ge T$.

The proof of Theorem 1 shows that given any $k$ and $\epsilon > 0$,
one can pick a vector $v(\epsilon)$ so that if $x(0) = v(\epsilon)$ then
$V(x(k))/V(x(0)) \ge (1 - 6/n^2)^k - \epsilon$. Moreover, the vectors
$v(\epsilon)$ can be chosen to be arbitrarily small. One simply picks
$k = T$ and $\epsilon < (1 - 6/n^2)^k$ to get that $x(T)$ is not a multiple
of $\mathbf{1}$; and furthermore, picking $v(\epsilon)$ small enough in norm
to be in $\mathcal{B}$ results in a contradiction.

Remark: Theorem 1 gives a lower bound on how long we
must wait for the 2-norm $\|x(t) - \bar{x}\mathbf{1}\|_2$ to shrink by a factor
of $\epsilon$. What if we replace the 2-norm with other norms,
for example with the $\infty$-norm? Since $B_\infty(0, r/\sqrt{n}) \subset
B_2(0, r) \subset B_\infty(0, r)$, it follows that if the $\infty$-norm shrinks
by a factor of $\epsilon$, then the 2-norm must shrink by at least
$\sqrt{n}\,\epsilon$. Since $\epsilon$ only enters the lower bound of Theorem 1
logarithmically, the answer only changes by a factor of $\log n$
in passing to the $\infty$-norm. A similar argument shows that,
modulo some logarithmic factors, it makes no difference
which $p$-norm is used.
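
The ball inclusions used in this remark are equivalent to the norm inequalities $\|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty$. A minimal sketch that checks them on random vectors follows; the function names and the small tolerance are assumptions made for this example.

```python
import math
import random


def norm2(x):
    return math.sqrt(sum(v * v for v in x))


def norm_inf(x):
    return max(abs(v) for v in x)


def inclusions_hold(x, tol=1e-12):
    """Check ||x||_inf <= ||x||_2 <= sqrt(n) * ||x||_inf, up to a small
    tolerance for floating-point rounding."""
    n = len(x)
    return (norm_inf(x) <= norm2(x) + tol
            and norm2(x) <= math.sqrt(n) * norm_inf(x) + tol)


if __name__ == "__main__":
    random.seed(0)
    vectors = [[random.uniform(-1.0, 1.0) for _ in range(8)]
               for _ in range(1000)]
    print(all(inclusions_hold(x) for x in vectors))
```

The factor $\sqrt{n}$ between the two norms is what becomes the additive $\log n$ term once it passes through the logarithm in the bound.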

III. Conclusions 

We have proved a lower bound on the convergence time 
of local averaging algorithms which scales quadratically in 
the number of agents. This lower bound holds even if all the 
communication graphs are equal to a fixed line graph. Our 
work points to a number of open questions. 

1) Is it possible to loosen the definition of local averaging
algorithms to encompass a wider class of algorithms?
In particular, is it possible to weaken the requirement
that each $f_{i,G(t)}$ be smooth, perhaps only to the
requirement that it be piecewise-smooth or continuous,
and still obtain an $\Omega(n^2)$ lower bound?

2) Does the worst-case convergence time change if we
introduce some memory and allow $x_i(t+1)$ to depend
on the last $k$ sets of messages received by agent $i$?
Alternatively, there is the broader question of how
much there is to be gained if every agent is allowed to
keep track of extra variables. Some positive results in
this direction were obtained in [11].

3) What if each node maintains a small number of update
functions, and is allowed to choose which of them
to apply? Our lower bound does not apply to such
schemes, so it is an open question whether it is possible
to design practical algorithms along these lines with
worst-case convergence time scaling better than $n^2$.